Make is_valid_syntax parts lazy loaded by zibertscrem · Pull Request #14 · willynilly/rfc3987-syntax

zibertscrem · 2026-04-20T12:27:18Z

This PR is about optimizing performance of the main module import (syntax_helpers one). I believe there are other PRs with the same goal but my fork has been already tested in real environment

Why do I care

So, I develop a big data platform with orchestration based on airflow, and some of the airflow components start painfully slow. Specifically, api server and scheduler. And because of painfully slow startup sometimes on smaller environments I experience constant service restartings after re-deploys cause it cannot make initial start on time (which pisses me off infinitely)

When I performed profiling of the startup of those components - I've noticed that a lot of time was spent inside initial loading of jsonschema package which I believe airflow uses for enforcing their SDK API schema format

Then digging deeper I found that what's actually making things stuck is this package. Airflow spins up N parallel processes and they all go and try to import jsonschema which leads to importing this package N times on pretty small machine and it leads to freezes that cause k8s to kill them and reinitiate

What did I do

I just put everything that made sense under cached functions and added module-level getter for original variables and it shrinked loading time by a lot. I already tested it in my platform and it seems working just fine with jsonschema package. I don't know if any other library uses it differently but I tested what I cared for, the optimized version has been deployed to the dev environment of my platform and worked just fine. I also fixed a typo in RFC3987_SYNTAX_TERMS var (it lacked comma between terms)

I originally thought to put all the names into __all__ but then asterisk import from __init__ actually forces to load everything and it basically nukes all my lazy-loading efforts. Maybe it makes sense to put all the original names somewhere cause RFC3987_SYNTAX_TERMS variable actually lacks some of the function names that were presented in a module

Conclusion

If you want to me to change anything in the lazy load approach - pls let me know, I try to fit it in. Anyways, I have a good opportunity to test if I break anything in real env. I hope you find this PR good enough so we can have an optimized module loading. I believe a lot of people are pissed off because of such very deep dependency's loading time.

Thanks for bringing this library to the world anyways. I made these changes without AI and tested them manually and deployed on real environment. Yeah, it's dev environment but still is pretty real one

Copilot

Pull request overview

This PR aims to reduce import-time overhead for rfc3987_syntax.syntax_helpers by lazily initializing the Lark grammar/parser and validator helpers, targeting faster startup for consumers that import the module indirectly (e.g., via jsonschema).

Changes:

Lazily load grammar and the main Lark parser via @lru_cache-backed getters.
Replace eagerly-created is_valid_syntax_* module globals with module-level __getattr__ that creates validators on demand.
Fix a missing comma in RFC3987_SYNTAX_TERMS.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-30T11:18:55Z

+        term = name.removeprefix("is_valid_syntax_")
+        return make_syntax_validator(term)
+    msg = f"Attribute {name} not found"
+    raise AttributeError(msg, name=name)


raise AttributeError(msg, name=name) is not a valid way to construct AttributeError (it doesn't accept keyword arguments). This will raise a TypeError when an unknown attribute is accessed. Raise AttributeError(msg) (or AttributeError(msg, name) positionally if you want .name set) instead.

Suggested change

raise AttributeError(msg, name=name)

raise AttributeError(msg)

Copilot · 2026-04-30T11:18:55Z

+        return get_syntax_parser()
+    if name.startswith("is_valid_syntax_"):
+        term = name.removeprefix("is_valid_syntax_")
+        return make_syntax_validator(term)


__getattr__ currently treats any is_valid_syntax_* attribute as valid and passes the suffix straight into make_syntax_validator(). Combined with @lru_cache(maxsize=None) this allows unbounded creation/caching of parsers for arbitrary attribute names (and can also surface confusing Lark configuration errors for typos). Consider checking term against RFC3987_SYNTAX_TERMS and raising AttributeError for unknown terms.

Suggested change

return make_syntax_validator(term)

if term in RFC3987_SYNTAX_TERMS:

return make_syntax_validator(term)

Copilot · 2026-04-30T11:18:56Z

+__all__ = [
+    "is_valid_syntax",
+]


There are behavior changes here around lazy module attributes (grammar, syntax_parser, is_valid_syntax_*) and around what gets exported at package import time. Existing tests only exercise is_valid_syntax via import rfc3987_syntax as h, so regressions like missing re-exports (parse, RFC3987_SYNTAX_TERMS, is_valid_syntax_iri, etc.) or __getattr__ behavior won't be caught. Add tests that assert these public attributes are available and callable after importing rfc3987_syntax.

Copilot · 2026-04-30T11:18:56Z

+__all__ = [
+    "is_valid_syntax",
+]


Defining __all__ as only ['is_valid_syntax'] changes what from .syntax_helpers import * exports. Since src/rfc3987_syntax/__init__.py uses a star import, this will stop re-exporting parse, RFC3987_SYNTAX_TERMS, and the documented is_valid_syntax_* helpers (breaking the public API/README examples). Consider removing the star import from __init__.py and explicitly re-exporting symbols (possibly via package-level __getattr__), or otherwise ensuring the previous public surface remains available without eagerly constructing all validators on import.

Make syntax parts lazy loaded

27c718f

zibertscrem mentioned this pull request Apr 20, 2026

Import time improvements #7

Open

jkowalleck mentioned this pull request Apr 23, 2026

[IDEA] Prepare and Publish Maintained Fork to PyPI jkowalleck/fork_rfc3987-syntax#1

Closed

26 tasks

jkowalleck requested a review from Copilot April 30, 2026 11:14

Copilot started reviewing on behalf of jkowalleck April 30, 2026 11:15 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make is_valid_syntax parts lazy loaded#14

Make is_valid_syntax parts lazy loaded#14
zibertscrem wants to merge 1 commit into
willynilly:mainfrom
venomlab:main

zibertscrem commented Apr 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	raise AttributeError(msg, name=name)
	raise AttributeError(msg)

	return make_syntax_validator(term)
	if term in RFC3987_SYNTAX_TERMS:
	return make_syntax_validator(term)

Conversation

zibertscrem commented Apr 20, 2026

Why do I care

What did I do

Conclusion

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants